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Abstract: A test of a new Bayesian approach to solar flare prediction (Wheatland 2004a) is presented. 
The approach uses the past history of flaring together with phenomenological rules of flare statistics 
to make a prediction for the probability of occurrence of a large flare within an interval of time, or 
to refine an initial prediction (which may incorporate other information). The test of the method is 
based on data from the Geostationary Observational Environmental Satellites (GOES), and involves 
whole-Sun prediction of soft X-ray flares for 1976-2003. The results show that the method somewhat 
over-predicts the probability of all events above a moderate size, but performs well in predicting large 
events. 
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1 Introduction 

The space weather effects of large solar flares motivate 
flare prediction. For example, the soft X-ray flux due 
to large flares causes increased ionisation of the upper 
atmosphere, which can interfere with high frequency 
radio communication. There is considerable interest in 
knowing when such short-wave fadeouts are likely to 
occur, and Australia's Ionospheric Prediction Service 
(IPS) issues predictions on this basis. 1 Other agen- 
cies issuing flare predictions include the US National 
Oceanic and Atmospheric Administration (NOAA) 2 
and NASA 3 . 

Existing methods of prediction are probabilistic, 
and rely e.g. on the classification of physical character- 
istics of active regions, and historical rates of flaring 
for regions with a given classification (Mcintosh 1990; 
Bornmann & Shaw 1994). One weakness of classifi- 
cation based approaches is that regions with a given 
classification may exhibit a wide range of flaring rates. 
The method of Mcintosh (1990) also considers other 
information including the number of large flares al- 
ready produced by an active region (the tendency of an 
active region which has produced large events to sub- 
sequently produce large events is called persistence), 
but this is done in an ad hoc way. No consideration is 
given to the important information in the number of 
small events already observed. 

A new approach to flare prediction (Wheatland 
2004a) exploits the history of observed flaring together 
with simple phenomenological rules of flare statistics 
to make a prediction, or to refine an existing predic- 
tion. The basic method is as follows. It is well known 
that the size distribution of flares (e.g. the distribu- 
tion of peak soft X-ray flux) follows a power law (e.g. 
Crosby, Aschwanden & Dennis 1993): 



N(S) = A 1 ( 7 -1)S7- 1 S- 



(1) 



-^see lhttpV/w ww.ips.g ov.aul 

2 see http:/ /www. sec. noaa.gov/ftpdir/latest/daypre.txt 



where N(S) is the number of events per unit size S 
and per unit time, Ai is the total rate of events above 
size Si, and 7 is the power-law index. Suppose we are 
interested in the probability of a large event (S > S2) 
occurring in a time AT. The expected rate of events 
above S2 is, according to Equation JjJ 



A2 = Ai 



7-1 



(2) 



Flare occurrence may be described on short timescales 
as a Poisson process in time (e.g. Moon et al. 2001), 
and on longer timescales as a time-dependent Poisson 
process (e.g. Wheatland 2001). According to Poisson 
statistics, the probability of at least one large event in 
time AT is 

e = l-exp(-A 2 AT). (3) 

To apply these formulae it is necessary to estimate Ai 
and 7 from data, and hence to estimate e. We adopt 
a Bayesian approach, in which 'estimating' a parame- 
ter means calculating a posterior probability distribu- 
tion for the parameter, given available data and any 
prior information (e.g. Jaynes 2003). We assume that 
a a sequence of M events with sizes si, S2, sm (all 
larger than Si) have been observed to occur at times 
ti < t-2 < ... < tu respectively. The power-law index 
7 may be approximated by the maximum likelihood 
value (Bai 1993) 



7 = 



M 

hl7T 



+ 1, where 



n 



£1 

Si 



(4) 



3 see http:/ /beauty.nascom. nasa.gov/arm/latest/ 



To estimate the rate we adopt a piecewise constant 
Poisson model. Hence we need to identify the most 
recent interval T 1 during which the rate is constant, 
and we assume that M' < M events occurred during 
that time. One approach to the determination of the 
interval T' is to use the 'Bayesian blocks' procedure 
(Scargle 1998), which is discussed in more detail be- 
low. Based on M' , T' and 7*, the posterior probability 
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distribution for e is (Wheatland 2004a) 

P(e) = C{-\n(l-e)] M ' (l-e)^'^^ 3 ^ 3 ^*"- 1 

where A(Ai) is the prior distribution for Ai, i.e. the 
distribution we would assign to Ai in the absence of 
any data, and C is the normalisation constant, deter- 
mined by the requirement J Q P(e)de — 1. The mean of 
■P(e) provides the estimate of the probability of at least 
one large event within time AT, and the standard de- 
viation of the distribution provides an estimate of the 
associated uncertainty. 

2 The test 

As a basic test of the new approach to prediction, 
Equation © was applied to the NOAA Solar Event 
Lists of X-ray flares observed by the Geostationary Ob- 
servational Environmental Satellites (GOES) for 1975- 
2003. 4 For each day, whole-Sun flare prediction was 
performed for the next day (AT = 1 day) . 

The relevant measure of size, S, of the GOES events 
is the peak soft X-ray flux in the 1-8A band. The 
choice of threshold Si = 4 x 10" 6 W m 2 was made, 
based on inspection of the distribution of peak soft 
X-ray flux for the entire dataset. Figure 1 shows the 
distribution, plotted in differential form (upper panel) 
and cumulative form (lower panel). The threshold 
size Si is indicated by the vertical solid line. Events 
above this size are observed to be distributed approxi- 
mately as a power law, and the thick solid lines in each 
panel indicate the power-law model, with the maxi- 
mum likelihood power-law index 7* « 2.15 ±0.01. For 
peak fluxes below the threshold there is departure from 
power-law behaviour due to problems with event selec- 
tion against the time-varying soft X-ray background. 

Predictions were made for each day for events above 
size S2 = 10 _5 Wm -2 ('M class' and larger flares) 
and for events above size S2 = 10~ 4 Wm -2 ('X class' 
flares) using Equation JSj- The corresponding predic- 
tion probabilities for a given day are labelled em and 
ex respectively. It should be noted that these values 
are not independent, since X class flares are a subset 
of events above M class. 

The predictions used data within a window of time 
spanning one year prior to each day. For each day, 
Equation £IJ was applied to the one-year window of 
data prior to the day to determine 7* . Then the Bayesian 
blocks procedure was applied to the same data to de- 
termine a decomposition into a piecewise-constant Pois- 
son process. This procedure returns a sequence of 
times tso < tsi < ■•■ < isif at which the rate is de- 
termined to change (where tso and tsx are the start- 
and end-time of the window) , and a corresponding se- 
quence A_bi, A_B2, Ask of rates. The last Bayesian 
block was used to determine T' and M': viz. T' = 
tsK - t B (K-V) and M' = XbkT' . 

4 available from ftp://ftp.ngdc.noaa.gov 
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Figure 1: Upper panel: differential distribution of 
peak flux for GOES events 1976-2003 (histogram), 
and the power-law model distribution (thick line) . 
Lower panel: cumulative distribution for all events 
(joined points), and the model (thick line). In both 
panels the threshold Si for power-law bevaviour is 
shown by the vertical line. 



The data in every Bayesian block but the last was 
used to construct the prior A(Ai). A model form A(Ai) = 
aexp(— bXI) was chosen for the prior, and the parame- 
ters a, b, and c were determined by requiring that the 
first three moments of this model distribution match 
the first three moments of the data, estimated from 
the Bayesian blocks decomposition. Specifically we re- 
quired that the model distribution was normalised, and 
that it had a mean rate and mean square rate equal to 
the corresponding estimates from the Bayesian blocks. 
These three conditions uniquely determined values of 
a, b, and c. 

3 Results 

Figure 2 illustrates the results of the test on a year 
by year basis for 1976-2003 (the results for 1975 are 
omitted because the predictions are made using less 
than a year of previous data) . The upper panel shows 
the predictions for M class (and larger) flares. The 
histograms represent the observed number of days on 
which there was at least one M class flare (or larger). 
The diamonds represent the sum of the em values for 
all days within the given year, which is the predicted 
number of days on which there should be at least one 
event of M class or larger. The lower panel shows 
the same display, but for events of X class. Uncer- 
tainties are shown for the predicted values, based on 
summation of the individual prediction uncertainties 
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in quadrature. 

The upper panel of Figure 2 indicates that the val- 
ues of £m are systematically too large. More quanti- 
tatively, we find that the average value of em over all 
days (1976-2003) is 0.320, whereas the observed value 
(the fraction of days on which there was at least one 
M class event) is 0.264. The lower panel of Figure 2 
indicates that the method has done quite well in pre- 
dicting X class events. In fact the average value of ex 
for 1976-2003 is 0.040, whereas the observed value is 
0.036. 



in that it does not assign values larger than about 0.5. 
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Figure 2: Upper panel: observed/predicted 
numbers of M (or larger) event days (his- 
togram/diamonds). Lower panel: the same but 
for X event days. 



Figure 3 gives a more detailed display of the results 
of the test for all years (1976-2003) in the form of a pair 
of 'reliability plots', in which the horizontal axis shows 
the forecast probabilities em or ex for each day (in bins 
of 0.05), and the vertical axis shows the true probabil- 
ity for flaring on the given days, estimated from the 
observed number of event days. This is the Bayesian 
estimate assuming binomial statistics and a uniform 
prior: if there are R days with at least one event out 
of a total of S days, the estimate for the probability 
is p = (R + 1)/(S + 2) and the corresponding error is 
[p{l-p)/{S + 3)] 1/2 (e.g. Jaynes 2003, p. 165). Perfect 
prediction corresponds to the solid 45 degree line on 
the plot. The upper panel of Figure 3 is the reliability 
plot for M (and larger) event prediction, and the lower 
panel is the reliability plot for X event prediction. The 
upper panel confirms that the predictions for M class 
events are systematically too large, although it shows 
that the effect is only associated with days on which 
em is larger than about 0.25. The lower panel indicates 
that the predictions for X class flares are quite good for 
all values of ex, although the method is conservative, 




Figure 3: Upper panel: reliability plot for predic- 
tion of M events and above. Lower panel: the same 
but for X events. 



It is interesting to compare these results with pre- 
dictions made by the US National Oceanic and At- 
mospheric Administration. Statistics are available for 
NOAA predictions for 1987-2002 5 . The NOAA re- 
sults indicate a very serious over-prediction of X class 
events. During 1987-2002 the NOAA predictions im- 
ply that there should be 372.5 X event days, when 
in fact there are 200 such days. The present method 
predicts 233.8 X event days for 1987-2002, which is 
a considerable improvement over the NOAA result. 
For M (and larger) events, the NOAA results show 
over-prediction similar to the results obtained with 
the present method. A more detailed comparison with 
NOAA predictions will be presented in a future paper 
(Wheatland 2004b). 

4 Discussion 

A Bayesian approach to flare prediction (Wheatland 
2004a) has been tested for whole-Sun prediction of 
GOES soft X-ray events, based on the NOAA Solar 
Event Lists for 1975-2003. The method is found to 
over-predict events of M class and above, but performs 
quite well for prediction of X class events. 

There are several possible reasons for the over- 
prediction of events above M class. One possibility 
is that the method is systematically late in detecting 
the decline in rate associated with the decay of a large 
active region, or the rotation of a large active region 
off the disk. The method uses the Bayesian blocks pro- 
cedure to detect rate changes, and is always trying to 



5 See http:/ /www. noaa.sec.gov/verification/ 
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'catch up' with what the Sun is doing. A specific lim- 
itation of the Bayesian blocks method is that it must 
have at least one event in a block, so that the rate is 
never identically zero. This limitation leads to overes- 
timation of the rate at times of very low activity. It 
should also be noted that the existing Bayesian blocks 
procedure is not guaranteed to find an optimal decom- 
position — in future this procedure will be replaced 
by an optimal algorithm recently devised by Scargle 
(Scargle, private communication, 2003). Another pos- 
sibility is that there is a bias in each individual pre- 
diction which becomes apparent in the analysis of a 
large number of predictions. Such a bias will become 
less serious if each individual prediction is more accu- 
rate. The fractional error in each prediction goes as 
(M') _1/2 (Wheatland 2004a), where W is the num- 
ber of events associated with the estimation of the rate 
above size S\ . This error becomes smaller with increas- 
ing M' ', i.e. if a smaller Si can be chosen. It should 
be noted that the GOES event lists are a relatively 
poor choice for the present purpose because the time 
varying soft X-ray background means that a relatively 
large Si must be used (see Figure 1). 

The GOES event lists also have other shortcom- 
ings as a basis for prediction from event statistics. Be- 
sides the departure from a power law at small sizes, 
it is likely the lists are incomplete above the nominal 
threshold Si , e.g. due to the difficulty of distinguishing 
two flares occurring close together in time (Wheatland 
2001). This is unlikely to be important for the test de- 
scribed here, provided that the size distribution obeys 
a power law above the threshold, since both predic- 
tion and validation of the prediction rely on the same 
(possibly incomplete) lists. Another problem with the 
GOES event lists is that the GOES peak fluxes are not 
background subtracted, so that the intrinsic flux due 
to an M class event at a time of low activity, when the 
background is low, is larger than the intrinsic flux due 
to an M class event at a time of high activity, when 
the background is high. However, again this is not 
particularly important to the present method provided 
that the size distribution for observed events is not dis- 
torted from a power- law form by this effect. The use 
of a previous year of data to make a prediction allows 
the possibility of incorporating variation in the power- 
law index with the solar cycle (e.g. due to this effect) 
but in fact we find no evidence of such a variation. In 
subsequent work the method will be applied to more 
accurate event catalogs. 

The present test is limited to whole-Sun predic- 
tion. In the future the method will also be applied to 
individual active regions, and other prior information 
on the rate (e.g. the Mcintosh classification of the as- 
sociated sunspots) will be incorporated. However we 
note that, even in this simple form, the method has 
out-performed NOAA predictions for X class events. 

Finally we note that automated predictions, based 
on this method, are now published on the web. 6 Pre- 
dictions are made each day using the latest NOAA so- 
lar event lists. The web pages include a running score 
of how reliable the published predictions are, in the 



form of automatically updated reliability plots. 
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